Generate robust weights for points excluded by UV-cut. #225

Open: wants to merge 1 commit into main

JSKenyon
Collaborator

@landmanbester Would appreciate your thoughts here. The current behaviour results in weights of zero for points excluded by the UV-cut. This has repercussions for MAD flagging, as points with zero weight will end up flagged (their whitened residuals will be zero). These changes will result in robust weights being produced for points excluded by the UV-cut. My intuition is that this is sensible: we are applying the gains to those points, after all.
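
A minimal Python sketch of the old and new behaviour follows. It assumes a Student's t-style reweighting, w = (ν + 1)/(ν + e²) with e the whitened residual under the current weights, and one weight per point. The function names, the `zero_uvcut` switch and the choice ν = 3 are hypothetical and not taken from the code in this PR.

```python
import math

def whiten(residual, weight):
    """Magnitude of the whitened residual: sqrt(weight) * |residual|."""
    return math.sqrt(weight) * abs(residual)

def robust_weights(residuals, weights, in_uvcut, nu=3.0, zero_uvcut=False):
    """Student's t-style reweighting (assumed scheme): w_new = (nu + 1) / (nu + e**2).

    With zero_uvcut=True, points excluded by the UV-cut get weight zero
    (the old behaviour); otherwise they are reweighted like any other point.
    """
    new = []
    for r, w, cut in zip(residuals, weights, in_uvcut):
        if zero_uvcut and cut:
            new.append(0.0)
        else:
            e = whiten(r, w)
            new.append((nu + 1.0) / (nu + e * e))
    return new

residuals = [0.5, -1.0, 2.0, 0.1]
weights = [1.0, 1.0, 1.0, 1.0]
in_uvcut = [False, False, True, True]  # last two points lie inside the UV-cut

old = robust_weights(residuals, weights, in_uvcut, zero_uvcut=True)
new = robust_weights(residuals, weights, in_uvcut)

# Old behaviour: zero weight means a zero whitened residual, however large |r| is.
print([whiten(r, w) for r, w in zip(residuals, old)])
# New behaviour: every point, cut or not, gets a non-zero robust weight.
print([whiten(r, w) for r, w in zip(residuals, new)])
```

Under the old behaviour a large residual inside the cut is invisible to anything that works on whitened residuals, which is why such points end up treated specially by the flagger.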

@landmanbester
Collaborator

I think this is about the most sensible thing you can do. It also shows why flagging on the whitened residuals is the way to go. If you flag on just the residuals when there is unmodelled flux present, you run the risk of i) biasing your MAD estimates and ii) flagging unmodelled flux. You may also want to have an option where points within the baseline cut are not considered during MAD flagging (I am thinking of the case where you smoove (not a typo) the gains and flag on the whitened residuals using the original weights).
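
The point about whitened residuals can be illustrated with a small sketch. It assumes real-valued residuals, one weight per point, and a simple MAD flagger that flags a point when its deviation from the median exceeds `threshold` times 1.4826 × MAD. The function names and the `exclude` mask (standing in for the suggested option of leaving baseline-cut points out of MAD flagging) are hypothetical. The last point plays the role of unmodelled flux that robust reweighting has already down-weighted; the numbers are made up for illustration.

```python
import math
import statistics

MAD_TO_SIGMA = 1.4826  # for Gaussian data, sigma ≈ 1.4826 * MAD

def whiten(residuals, weights):
    """Whitened residuals: each residual scaled by the square root of its weight."""
    return [math.sqrt(w) * r for r, w in zip(residuals, weights)]

def mad_flag(values, threshold=5.0, exclude=None):
    """Flag values whose deviation from the median exceeds threshold * sigma.

    sigma is estimated as 1.4826 * MAD from the non-excluded values only.
    Excluded points (hypothetical option) are never flagged.
    """
    if exclude is None:
        exclude = [False] * len(values)
    used = [v for v, ex in zip(values, exclude) if not ex]
    if not used:
        return [False] * len(values)
    med = statistics.median(used)
    sigma = MAD_TO_SIGMA * statistics.median(abs(v - med) for v in used)
    return [(not ex) and abs(v - med) > threshold * sigma
            for v, ex in zip(values, exclude)]

residuals = [0.1, -0.2, 0.3, -0.1, 0.2, 0.0, 3.0]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.01]  # last point: down-weighted unmodelled flux

raw_flags = mad_flag(residuals)                     # flags the last point
white_flags = mad_flag(whiten(residuals, weights))  # flags nothing
cut = [False] * 6 + [True]  # pretend the last point lies inside the baseline cut
cut_flags = mad_flag(residuals, exclude=cut)        # last point left alone
print(raw_flags, white_flags, cut_flags, sep="\n")
```

Flagging on the raw residuals removes the down-weighted point, while flagging on the whitened residuals keeps it.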

@JSKenyon
Collaborator Author

Excluding points in the UV-cut during MAD flagging was the cause of some of @o-smirnov's problems. If a baseline is excluded, it will not have a MAD estimate, at which point it becomes a little unclear what to do with it: you either have to flag it entirely (which is incorrect, and is what was previously happening) or ignore it, which means that the flagging on the short baselines may be bad. For now I think that ignoring the UV-cut in both the reweighting and the MAD flagging is sensible. After all, even if there is bad data there, it will not affect the gains, and ultimately we probably do want to flag or downweight bad data. All of this is up for debate, though; we can modify as needed.
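
The dilemma can be sketched per baseline: if the MAD estimate is built only from points outside the UV-cut, a baseline lying entirely inside the cut has nothing to estimate from. This is a minimal sketch assuming real-valued whitened residuals; the policy names `"flag"`, `"ignore"` and `"include"` and the function name are hypothetical, with `"include"` corresponding to ignoring the UV-cut during MAD flagging.

```python
import statistics

MAD_TO_SIGMA = 1.4826  # for Gaussian data, sigma ≈ 1.4826 * MAD
POLICIES = ("flag", "ignore", "include")

def flag_baseline(whitened, in_uvcut, policy, threshold=5.0):
    """MAD-flag one baseline's real-valued whitened residuals.

    Hypothetical policies:
      "flag"    estimate from points outside the cut; flag everything if none remain
      "ignore"  estimate from points outside the cut; flag nothing if none remain
      "include" ignore the UV-cut and estimate from every point
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "include":
        used = list(whitened)
    else:
        used = [v for v, cut in zip(whitened, in_uvcut) if not cut]
    if not used:
        # No MAD estimate exists for this baseline.
        return [policy == "flag"] * len(whitened)
    med = statistics.median(used)
    sigma = MAD_TO_SIGMA * statistics.median(abs(v - med) for v in used)
    return [abs(v - med) > threshold * sigma for v in whitened]

# A short baseline lying entirely inside the UV-cut, with one gross outlier.
short = [0.1, -0.3, 0.2, 9.0]
all_cut = [True] * 4
for policy in POLICIES:
    print(policy, flag_baseline(short, all_cut, policy))
```

With `"flag"` the whole baseline is lost, with `"ignore"` the outlier survives, and with `"include"` only the outlier is flagged.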

@o-smirnov
Collaborator

> For now I think that ignoring the UV-cut in both the reweighting and the MAD flagging is sensible.

Hmm, I think this is slightly different from the CC madmax behaviour, which has a separate residual flagging round at the end (enabled by --madmax-residuals) that flags all residuals, including those excluded by the cut (since these are computed anyway, as they should be).

But now that I think about it, there are use cases for both behaviours:

  • I routinely use a mild uv-cut (100 m or so) to avoid contaminating the gains with leftover RFI on short baselines. However, in this case I still want to do MAD flagging on those short baselines.

  • If the image was constructed with a uv-cut/inner taper (as we now try to do for Sun-contaminated images), the model on short spacings is invalid. But now I don't want to do MAD flagging on them, because it would presumably flag an excessive amount of data. @Victoria-Samboco's solar imaging pipeline, for example, should operate in this mode, because she's going to rephase and image the Sun later.
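
The two use cases amount to one extra switch on top of the UV-cut. A minimal sketch, assuming the cut is a minimum baseline length in metres; `eligible_for_mad_flagging` and `flag_inside_uvcut` are hypothetical names, not existing options.

```python
def eligible_for_mad_flagging(baseline_lengths_m, uvcut_m, flag_inside_uvcut):
    """Return, per baseline, whether MAD flagging should touch it.

    Baselines shorter than uvcut_m are excluded from the gain solve either way;
    flag_inside_uvcut only decides whether they are still MAD-flagged.
    """
    return [flag_inside_uvcut or length >= uvcut_m for length in baseline_lengths_m]

lengths = [50.0, 150.0, 1000.0]
# Mild cut to keep RFI out of the gains, but still flag the short baselines.
rfi_mode = eligible_for_mad_flagging(lengths, 100.0, flag_inside_uvcut=True)
# Model invalid on short spacings (e.g. Sun-contaminated fields): leave them alone.
# The same 100 m cut is reused here purely for illustration.
solar_mode = eligible_for_mad_flagging(lengths, 100.0, flag_inside_uvcut=False)
print(rfi_mode, solar_mode)
```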

@JSKenyon
Collaborator Author

> If the image was constructed with a uv-cut/inner taper (as we now try to do for Sun-contaminated images), the model on short spacings is invalid. But now I don't want to do MAD flagging on them, because it would presumably flag an excessive amount of data. @Victoria-Samboco's solar imaging pipeline, for example, should operate in this mode, because she's going to rephase and image the Sun later.

Interesting. I could add options for finer control over this behaviour. However, my instinct is that this is a dangerous regime in which to use the MAD flagger anyway. Fundamentally, the MAD flagger assumes that the whitened residuals (although these may need to be the whitened corrected residuals, now that I think about it) are Gaussian. This assumption is only truly valid once our model is complete and our data are adequately calibrated. Prior to that, the residuals are actually Student's t-distributed with an unknown degrees-of-freedom (DOF) parameter. This means that we need to be very careful when MAD flagging (particularly if we have no weights or our weights are incorrect), or else we can introduce weird biases which can manifest as ghosts. It should still catch gross outliers, but using it at the outset will almost certainly cause problems.
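
To illustrate why the Gaussian assumption matters, the sketch below compares the fraction of points that a MAD-based 5σ flagger removes from Gaussian samples and from Student's t samples with ν = 3. The threshold, ν, seed and sample size are arbitrary choices for illustration, not values from the flagger in this PR.

```python
import math
import random
import statistics

MAD_TO_SIGMA = 1.4826  # sigma ≈ 1.4826 * MAD holds for Gaussian data only

def fraction_flagged(samples, threshold=5.0):
    """Fraction of samples deviating from the median by more than threshold * sigma_MAD."""
    med = statistics.median(samples)
    sigma = MAD_TO_SIGMA * statistics.median(abs(x - med) for x in samples)
    return sum(abs(x - med) > threshold * sigma for x in samples) / len(samples)

def student_t(rng, nu):
    """Draw t = Z / sqrt(chi2_nu / nu), with chi2_nu drawn as Gamma(nu/2, scale=2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    return z / math.sqrt(chi2 / nu)

rng = random.Random(42)
n = 20000
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
heavy = [student_t(rng, nu=3.0) for _ in range(n)]

frac_gauss = fraction_flagged(gaussian)
frac_t = fraction_flagged(heavy)
print(frac_gauss, frac_t)
```

With heavy-tailed residuals the same threshold removes noticeably more points, since the 1.4826 factor no longer maps the MAD onto the true spread of the tails.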
